[Day29]: Metrics for Evaluating Generative Models: LPIPS & PSNR & SSIM

2023 iThome Ironman Contest

DAY 29

AI & Data

What Exactly Is Generative AI? A Look at Its True Face: Part 29 of the series

Tags: 15th Ironman Contest, deep learning, model evaluation metrics, generative AI

golucky_sir

2023-10-02 00:23:47

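Introduction

Today I will introduce some metrics for evaluating the quality of images produced by generative models. Turning human aesthetics and perception into mathematical formulas is very hard, so it is not only generative models that keep improving: evaluation metrics are slowly improving too. Today covers LPIPS, PSNR, and SSIM. The latter two are very, very easy to implement, while LPIPS takes a bit more time. With these metrics in hand, you will have a lot of help when judging how good a generative model is!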

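What are these metrics for?

First, a quick overview of common image-generation metrics. Many of them were designed to evaluate the quality of GAN-generated images, but they apply to all generative models. Different metrics have different strengths and weaknesses. For example, some metrics focus on whether the expected features are present but ignore the spatial relationships between them: they only check whether an image contains eyes, not where the eyes are, so an image with eyes below the mouth can still be rated as good. That is why experiments and research usually combine several metrics to judge generation quality more thoroughly!

Over these two days I will introduce different metrics, which give you a quantitative standard when evaluating your own generative models.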

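Peak Signal-to-Noise Ratio (PSNR)

PSNR measures the quality of images produced by a generative model. It first computes the mean squared error (MSE) between the pixel values of two images and then converts it to PSNR:

$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{MAX_y^2}{\frac{1}{C}\sum_{c=1}^{C}\mathrm{MSE}(x_c, y_c)} = 10 \log_{10} \frac{C \cdot MAX_y^2}{\sum_{c=1}^{C}\mathrm{MSE}(x_c, y_c)}$$

Here x is the generated image and y is the real image. Their roles are fixed because the right-hand side uses $MAX_y$ , the peak pixel value of the real image y. In practice this is the largest value a pixel can take (1 for images scaled to [0, 1], 255 for 8-bit images). C is the number of color channels: for an RGB image with 3 channels the MSE is additionally divided by 3, and moving that division from the denominator into the numerator gives the second form.

Signal-to-noise ratio is normally a signal-processing measure, but it works for image signals too. The resulting PSNR is expressed in decibels (dB). At roughly 30 to 50 dB the difference between the images is hard to see, while below 30 dB it is easily visible to the naked eye.

The higher the PSNR in dB, the closer the generated image is to the original image.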
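To make the computation concrete, here is a minimal NumPy sketch of PSNR. It assumes both images have the same shape and that `max_val` is the peak pixel value described above; the function name `psnr` and the toy images are illustrative only, not the code used later in this post.

```python
import numpy as np


def psnr(x, y, max_val=1.0):
    """PSNR in dB between a generated image x and a real image y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    # Averaging over every element also averages over the C color channels.
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Toy example: every pixel differs by 0.1, so MSE = 0.01 and PSNR is about 20 dB.
real = np.zeros((8, 8, 3))
fake = np.full((8, 8, 3), 0.1)
print(psnr(fake, real))
```

Returning infinity for identical images is one possible convention; an implementation that divides by the MSE without this check would fail instead.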

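How to implement PSNR

Implementing PSNR is very simple: just use the function built into TensorFlow! The code is below. img1 and img2 are both batches of images with shape (number of images, height, width, 3), and `max_val=1` means the pixel values are assumed to be scaled to [0, 1]. The two shapes must match. The resulting PSNR has shape (number of images,), because each image in img1 is compared with the corresponding image in img2. When two images are identical, the MSE is 0 and the PSNR is mathematically undefined. The computation then yields infinity (or a very large value if floating-point error leaves the MSE tiny but nonzero), and other implementations may instead raise a division-by-zero error.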

import tensorflow as tf
psnr = tf.image.psnr(img1, img2, max_val=1)
print('PSNR between img1 and img2 is:', psnr)
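The call above assumes img1 and img2 already exist as float batches in [0, 1]. A minimal sketch of preparing such a batch from 8-bit images, using only NumPy, might look like this; the helper name `to_unit_batch` and the random test images are hypothetical.

```python
import numpy as np


def to_unit_batch(images):
    """Stack uint8 (H, W, 3) images into a float32 batch scaled to [0, 1]."""
    batch = np.stack([np.asarray(im) for im in images], axis=0)
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError("expected images of shape (H, W, 3)")
    return batch.astype(np.float32) / 255.0


# Four random 32x32 RGB images stand in for real data.
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]
img1 = to_unit_batch(imgs)
print(img1.shape, img1.dtype)
```

If the images are left as 8-bit values instead, `max_val` should be 255 rather than 1.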

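Structural Similarity Index (SSIM)

As its name suggests, SSIM measures the structural similarity between two images. It takes three factors into account (luminance, contrast, and structure) and uses them to quantify how distorted an image is. Compared with PSNR, SSIM results agree better with human perception. SSIM is symmetric, unlike KL divergence, so for any image a and image b, SSIM(a,b)=SSIM(b,a).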

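Let's look at how SSIM computes these three factors:

Luminance: $l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$ , where μ is the mean and $C_1$ is a constant that keeps the formula stable.
Contrast: $c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$ , where σ is the standard deviation and $C_2$ is another stabilizing constant.
Structure: $s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$ , where $\sigma_{xy}$ is the covariance of x and y and $C_3$ is another stabilizing constant.

Putting these together, the full SSIM formula is

$$\mathrm{SSIM}(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}$$

where α, β, and γ are all greater than 0 and set the relative importance of luminance, contrast, and structure.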

If two images are identical, SSIM(x,x)=1. In other words, the closer the SSIM is to 1, the closer the generated image is to the real image.
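As a rough illustration of the three terms, the sketch below computes SSIM once over the whole image with α=β=γ=1. This is a simplification: the function name `global_ssim` is hypothetical, the constants follow the commonly used choices $C_1=(0.01 \cdot MAX)^2$, $C_2=(0.03 \cdot MAX)^2$, $C_3=C_2/2$ (an assumption, not stated in this post), and library implementations typically evaluate SSIM over local windows and average the results, so their values will differ.

```python
import numpy as np


def global_ssim(x, y, max_val=1.0, k1=0.01, k2=0.03):
    """SSIM with alpha = beta = gamma = 1, treating the whole image as one window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = np.mean((x - mu_x) * (y - mu_y))  # covariance of x and y
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return float(luminance * contrast * structure)


# A random image and a noisy copy of it stand in for real and generated images.
rng = np.random.default_rng(0)
real = rng.random((64, 64))
fake = np.clip(real + 0.1 * rng.standard_normal(real.shape), 0.0, 1.0)
print(global_ssim(real, real))  # identical images give a value of (almost exactly) 1
print(global_ssim(fake, real))  # the noisy copy scores lower
```

Swapping the two arguments gives the same value, which matches the symmetry property mentioned above.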

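How to implement SSIM

Like PSNR, SSIM is very simple to implement: just use the function built into TensorFlow! The code is below. img1 and img2 are both batches of images with shape (number of images, height, width, 3), with pixel values scaled to [0, 1] to match `max_val=1`. The two shapes must match. The resulting SSIM has shape (number of images,), because each image in img1 is compared with the corresponding image in img2.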

import tensorflow as tf
ssim = tf.image.ssim(img1, img2, max_val=1)
print('SSIM between img1 and img2 is:', ssim)

Learned perceptual image patch similarity (LPIPS)

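LPIPS comes from a CVPR 2018 paper. CVPR is a leading conference in computer vision and related fields, and it produces a large amount of state-of-the-art research every year. Back to the topic: one of the ideas in LPIPS already appeared in SRGAN. When we covered SRGAN, we said that it computes a perceptual loss, which measures the difference between the features of two images. You can review that part in [D22: SRGAN]. LPIPS matches human perception better than PSNR and SSIM (it uses a deep learning model, after all). Its formula is: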

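$$d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h, w} \left\lVert w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0hw} \right) \right\rVert_2^2$$

LPIPS formula, from Equation (1) of the original paper.

This formula computes the LPIPS distance $d(x,x_0)$ between $x$ and $x_0$ . Image features are extracted from layer $l$ of a neural network, and H and W are the height and width of that layer's features. The features are unit-normalized along the channel dimension and scaled channel-wise by the learned weights $w_l$ . The squared difference is then averaged over the spatial positions and summed over the layers.

The lower the LPIPS value, the more similar the two images are.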
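The distance computation itself can be sketched in NumPy once the per-layer features are available. In this sketch the features are random placeholders rather than real network activations, the channel weights are all ones rather than learned values, and the function names are hypothetical; it only illustrates the normalize, weight, square, average, and sum steps.

```python
import numpy as np


def unit_normalize(feat, eps=1e-10):
    """Scale the channel vector at each spatial position to unit length.

    feat has shape (H, W, C).
    """
    norm = np.sqrt(np.sum(feat ** 2, axis=-1, keepdims=True))
    return feat / (norm + eps)


def lpips_from_features(feats_x, feats_x0, weights):
    """Sum over layers of the spatially averaged, channel-weighted squared distance."""
    total = 0.0
    for fx, fx0, w in zip(feats_x, feats_x0, weights):
        diff = unit_normalize(fx) - unit_normalize(fx0)
        per_position = np.sum((w * diff) ** 2, axis=-1)  # squared L2 norm, shape (H, W)
        total += per_position.mean()  # average over the H x W positions
    return float(total)


# Placeholder features for two layers; a real metric would take them from a trained network.
rng = np.random.default_rng(0)
shapes = [(16, 16, 8), (8, 8, 16)]
feats_a = [rng.standard_normal(s) for s in shapes]
feats_b = [rng.standard_normal(s) for s in shapes]
weights = [np.ones(s[-1]) for s in shapes]
print(lpips_from_features(feats_a, feats_a, weights))  # identical features give 0.0
print(lpips_from_features(feats_a, feats_b, weights))
```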

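How to implement LPIPS

LPIPS uses a deep learning model to compute image features, so it is a little more complicated to use. First, pay attention to where each file lives. lpips_tf.py looks for the model file under ./utils/lpips relative to the working directory, so the relative paths must match exactly or you will get an error: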

utils
- lpips
  - net-lin_alex_v0.1.pb
  - lpips_tf.py
main.py

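The layout is shown above. lpips_tf.py is fixed: I could not find a way to install it as a library, so I took this file directly from GitHub. Do not modify this code! I include it at the bottom of this post.

You will also see a .pb file, which was downloaded from here. Keep the file name exactly as shown in the layout above: net-lin_alex_v0.1.pb.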
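Because a misplaced file is the most likely source of errors here, a small check like the following can be run from the project root before calling LPIPS. This is only a sketch: the helper name `missing_lpips_files` is hypothetical, and the two paths are taken from the layout shown above.

```python
from pathlib import Path

# Paths from the layout above, relative to the directory that contains main.py.
EXPECTED_FILES = [
    Path("utils/lpips/lpips_tf.py"),
    Path("utils/lpips/net-lin_alex_v0.1.pb"),
]


def missing_lpips_files(project_root="."):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [p.as_posix() for p in EXPECTED_FILES if not (root / p).is_file()]


# An empty list means both files are where they are expected to be.
print(missing_lpips_files())
```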

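main.py is where the metric is actually used; the code is below. Images are passed in with shape (number of images, height, width, color channels) and pixel values in [0, 1], so check the shape of your images before computing LPIPS. The code is basically fixed, so you can use it as is.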

def get_LPIPS_distance(true_img, fake_img):
    import tensorflow.compat.v1 as tf  # use the TF 1.x API
    tf.disable_v2_behavior()  # disable TF 2.x behavior
    from utils.lpips import lpips_tf
    # true_img & fake_img are np arrays, shape=(batch, height, width, channel), values in [0, 1]
    image0_t = tf.convert_to_tensor(true_img, dtype=tf.float32, name="true_img")
    image1_t = tf.convert_to_tensor(fake_img, dtype=tf.float32, name="fake_img")
    distance_t = lpips_tf.lpips(image0_t, image1_t, model='net-lin', net='alex')

    with tf.Session() as session:  # tf.Session exists only in the TF 1.x API
        # the inputs are already constants in the graph, so no feed_dict is needed
        distance = session.run(distance_t)
    return distance


if __name__ == '__main__':
    # replace img1 and img2 with your own image batches
    print(get_LPIPS_distance(img1, img2))

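The LPIPS source code I used was written for TensorFlow 1.x, so I run it by disabling the 2.x behavior and using the 1.x API. Do not downgrade TensorFlow itself, or you will have many incompatibilities to sort out across your whole environment!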

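Conclusion

Tomorrow I will introduce some other metrics. They can also be used to measure the quality of generated images, and they appear in papers more often than the ones covered today. You can use all of these metrics to evaluate the quality of the images your own generative models produce. I hope the models you build all perform well!

Appendix: lpips_tf.py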

import os
import sys

# import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from six.moves import urllib

_URL = 'http://rail.eecs.berkeley.edu/models/lpips'

def _download(url, output_dir):
    """Downloads the `url` file into `output_dir`.

    Modified from https://github.com/tensorflow/models/blob/master/research/slim/datasets/dataset_utils.py
    """
    filename = url.split('/')[-1]
    filepath = os.path.join(output_dir, filename)

    def _progress(count, block_size, total_size):
        sys.stdout.write('\r>> Downloading %s %.1f%%' % (
            filename, float(count * block_size) / float(total_size) * 100.0))
        sys.stdout.flush()

    filepath, _ = urllib.request.urlretrieve(url, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')

def lpips(input0, input1, model='net-lin', net='alex', version=0.1):
    """
    Learned Perceptual Image Patch Similarity (LPIPS) metric.

    Args:
        input0: An image tensor of shape `[..., height, width, channels]`,
            with values in [0, 1].
        input1: An image tensor of shape `[..., height, width, channels]`,
            with values in [0, 1].

    Returns:
        The Learned Perceptual Image Patch Similarity (LPIPS) distance.

    Reference:
        Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang.
        The Unreasonable Effectiveness of Deep Features as a Perceptual Metric.
        In CVPR, 2018.
    """
    # flatten the leading dimensions
    batch_shape = tf.shape(input0)[:-3]
    input0 = tf.reshape(input0, tf.concat([[-1], tf.shape(input0)[-3:]], axis=0))
    input1 = tf.reshape(input1, tf.concat([[-1], tf.shape(input1)[-3:]], axis=0))
    # NHWC to NCHW
    input0 = tf.transpose(input0, [0, 3, 1, 2])
    input1 = tf.transpose(input1, [0, 3, 1, 2])
    # normalize to [-1, 1]
    input0 = input0 * 2.0 - 1.0
    input1 = input1 * 2.0 - 1.0

    input0_name, input1_name = '0:0', '1:0'

    default_graph = tf.get_default_graph()
    producer_version = default_graph.graph_def_versions.producer

    cache_dir = os.path.expanduser('./utils/lpips')
    os.makedirs(cache_dir, exist_ok=True)
    # files to try. try a specific producer version, but fallback to the version-less version (latest).
    pb_fnames = [
        '%s_%s_v%s_%d.pb' % (model, net, version, producer_version),
        '%s_%s_v%s.pb' % (model, net, version),
    ]
    for pb_fname in pb_fnames:
        if not os.path.isfile(os.path.join(cache_dir, pb_fname)):
            try:
                _download(os.path.join(_URL, pb_fname), cache_dir)
            except urllib.error.HTTPError:
                pass
        if os.path.isfile(os.path.join(cache_dir, pb_fname)):
            break

    with open(os.path.join(cache_dir, pb_fname), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def,
                                input_map={input0_name: input0, input1_name: input1})
        distance, = default_graph.get_operations()[-1].outputs

    if distance.shape.ndims == 4:
        distance = tf.squeeze(distance, axis=[-3, -2, -1])
    # reshape the leading dimensions
    distance = tf.reshape(distance, batch_shape)
    return distance